Skip to content

ENH: Implemented MultiIndex.searchsorted method ( GH14833) #61435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

GSAUC3
Copy link

@GSAUC3 GSAUC3 commented May 12, 2025

@datapythonista
Copy link
Member

Thanks @GSAUC3 for the PR. Is there and issue, or has there been any discussion about this elsewhere?

@GSAUC3
Copy link
Author

GSAUC3 commented May 12, 2025

Hi @datapythonista, this was the issue #14833 , against which i made the pull request.

@GSAUC3
Copy link
Author

GSAUC3 commented May 12, 2025

HI, I had ran pytest and pre-commit locally, is it possible to run all these test locally?

@RB-zentronlabs
Copy link

@GSAUC3 Add a return statement in your function after except block.

  • check teh docstring and proper typing hints, IndexOpsMixin check this class, where searchsorted has a properly defined parameter structure.

@GSAUC3
Copy link
Author

GSAUC3 commented May 15, 2025

hi, @datapythonista , I am having trouble running all these tests, locally, before committing my code, so far i have only ran the pytest, locally, and that worked, could you please, guide me, how to set up the testing environment locally, before each commit?

will running
pre-commit run --all-files
suffice?

@datapythonista
Copy link
Member

pre-commit should run automatically if it's set up to work as intended. You have all the information on how to set up the development environment, run tests... in the development documentation: https://pandas.pydata.org/docs/development/index.html

@RB-zentronlabs
Copy link

RB-zentronlabs commented May 15, 2025

Hi, @datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?

=================================== FAILURES ===================================
__________________________ test_searchsorted[tuples] ___________________________
[gw0] darwin -- Python 3.10.17 /Users/runner/micromamba/envs/test/bin/python3.10
[XPASS(strict)] np.searchsorted doesn't work on pd.MultiIndex: GH 1[48](https://github.com/pandas-dev/pandas/actions/runs/15033468287/job/42250710830?pr=61435#step:5:52)33
___________________ test_searchsorted[mi-with-dt64tz-level] ____________________
[gw0] darwin -- Python 3.10.17 /Users/runner/micromamba/envs/test/bin/python3.10
[XPASS(strict)] np.searchsorted doesn't work on pd.MultiIndex: GH 14833
___________________________ test_searchsorted[multi] ___________________________

@datapythonista
Copy link
Member

Hi, @datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?

Yes, that's correct. I guess we have an xfail for the test that should be removed.

@GSAUC3
Copy link
Author

GSAUC3 commented May 17, 2025

Hi @datapythonista . Thank you for your suggestions, I've addressed the feedback from earlier and the CI checks are now passing. This PR should be ready for review whenever you get a chance. Please let me know if any changes are required. Thanks again!

@GSAUC3 GSAUC3 changed the title Searchsorted branch Implemented MultiIndex.searchsorted method (bug #14833) May 17, 2025
@GSAUC3 GSAUC3 changed the title Implemented MultiIndex.searchsorted method (bug #14833) ENH: Implemented MultiIndex.searchsorted method ( GH14833) May 17, 2025
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added few comments.

You will have to add a not in the whatsnew for 3.0. You can check other PRs to see how we do it.

Comment on lines 150 to 157
# if isinstance(obj, pd.MultiIndex):
# # See gh-14833
# request.applymarker(
# pytest.mark.xfail(
# reason="np.searchsorted doesn't work on pd.MultiIndex: GH 14833"
# )
# )
if obj.dtype.kind == "c" and isinstance(obj, Index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this instead of commenting please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, I will do it right away.

.gitignore Outdated
@@ -141,3 +141,5 @@ doc/source/savefig/
# Pyodide/WASM related files #
##############################
/.pyodide-xbuildenv-*

*.ipynb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to remove this. We do have some notebooks in this repo iirc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will remove it.

@@ -1029,3 +1029,28 @@ def test_get_loc_namedtuple_behaves_like_tuple():
assert idx.get_loc(("i1", "i2")) == 0
assert idx.get_loc(("i3", "i4")) == 1
assert idx.get_loc(("i5", "i6")) == 2


def test_searchsorted():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to divide this test. Ideally, when a test fails, we want to know what's wrong just by checking the test name, and also we want to still test everything else. With a single test, the first thing that fails will make the whole test fail. You can use pytest fixtures and parametrize to not repeat too much code.

Copy link
Author

@GSAUC3 GSAUC3 May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion, it is quite insightful for me. I have created three different functions as follows:

def test_searchsorted_single():... # for single inputs

def test_searchsorted_lists():... # list of single inputs

def test_searchsorted_invalid():... # for invalid inputs

Would this be the correct way to do it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that may seem silly, but then when a test fails it's very easy to know what it fails. Ideally an assert per test, and if more than one input is asserted, then pytest.mark.parametrize to reuse a test with multiple inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiarray searchsorted fails
3 participants